Distance based Clustering for Categorical Data
نویسندگان
چکیده
Learning distances from categorical attributes is a very useful data mining task that allows to perform distance-based techniques, such as clustering and classification by similarity. In this article we propose a new context-based similarity measure that learns distances between the values of a categorical attribute (DILCA DIstance Learning of Categorical Attributes). We couple our similarity measure with a famous hierarchical distance-based clustering algorithm (Ward’s hierarchical clustering) and compare the results with the results obtained from methods of the state of the art for this research field.
منابع مشابه
خوشهبندی خودکار دادههای مختلط با استفاده از الگوریتم ژنتیک
In the real world clustering problems, it is often encountered to perform cluster analysis on data sets with mixed numeric and categorical values. However, most existing clustering algorithms are only efficient for the numeric data rather than the mixed data set. In addition, traditional methods, for example, the K-means algorithm, usually ask the user to provide the number of clusters. In this...
متن کاملUsing Categorical Attributes for Clustering
The traditional clustering algorithms focused on clustering numeric data by exploiting the inherent geometric properties of the dataset for calculating distance functions between the points to be clustered. The distance based approach did not fit into clustering real life data containing categorical values. The focus of research then shifted to clustering such data and various categorical clust...
متن کاملارائه یک الگوریتم خوشه بندی برای داده های دسته ای با ترکیب معیارها
Clustering is one of the main techniques in data mining. Clustering is a process that classifies data set into groups. In clustering, the data in a cluster are the closest to each other and the data in two different clusters have the most difference. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...
متن کاملContext-Based Distance Learning for Categorical Data Clustering
Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, it is difficult to define a distance between pairs of values of the same categorical attribute, since they are not ordered. In this paper, we propose a method to learn a context-based distance for categorical attributes. The key intuition of this work is that the d...
متن کاملThe "Best K" for Entropy-based Categorical Data Clustering
With the growing demand on cluster analysis for categorical data, a handful of categorical clustering algorithms have been developed. Surprisingly, to our knowledge, none has satisfactorily addressed the important problem for categorical clustering – how can we determine the best K number of clusters for a categorical dataset? Since categorical data does not have the inherent distance function ...
متن کامل